Digital Humanities Project 1, Team 10

1. Names and group leader

Mukesh (Leader)

Amola

Raunak

Hitesh

3. Corpus Description – What corpus did you create and why

The corpus this research is based on consists of novels by H.G. Wells and Jules Verne. Both authors wrote science fiction in the late 19th and early 20th centuries and have often been referred to as the fathers of science fiction. The corpus containing Wells’ novels has 22 documents with 1,743,941 total words and 39,231 unique word forms. The corpus containing Verne’s novels has 20 documents with 1,734,737 total words and 33,672 unique word forms. These two authors were chosen to allow a comparison between the work of two early, prominent science fiction writers.

4. Summary Paragraph

Broadly, Verne’s vocabulary usage is more consistent over time than Wells’.

In H.G. Wells’ corpus, vocabulary density is relatively higher for earlier texts than for later ones. Wells wrote some of his shortest novels during the early phase of his writing career, and the length of his documents increases on average with later publication dates. The average number of words per sentence also tends to increase over time, with his earlier works featuring fewer words per sentence. While the vocabulary density in Wells’ works peaked in the earliest part of his writing career (with two of the earliest published works, The Time Machine and The Red Room), it was much more mixed across the rest of his works and never again reached that early level.

Wells’ corpus also contains 5,559 more unique word forms than Verne’s (39,231 against 33,672). However, this could be attributed to the fact that Verne largely wrote in French and many of the books in this corpus are translations of his works.

In contrast to Wells, Jules Verne’s works tended to be longer at the beginning of his career: the documents published earlier are longer than those published later. The vocabulary density of his works is more mixed. The work with the highest vocabulary density (In the Year 2889), almost double that of the next densest work, is rumoured to have been written by Jules Verne’s son, Michel Verne, but published under Jules’ name. The four works with the highest average words per sentence were all published between 1875 and 1881, while the works with the lowest words per sentence are from a variety of time periods.
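To make the density comparison concrete, the sketch below computes vocabulary density as the ratio of unique word forms to total words, which is our understanding of how Voyant's summary reports it (an assumption, not something we verified against Voyant's source). The function name and the dictionary layout are our own; the four figures are the corpus totals from the corpus description.

```python
def vocabulary_density(unique_forms: int, total_words: int) -> float:
    """Ratio of unique word forms to total words (a type-token ratio)."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return unique_forms / total_words


# Corpus totals as reported in the corpus description.
WELLS = {"unique": 39_231, "total": 1_743_941}
VERNE = {"unique": 33_672, "total": 1_734_737}

wells_density = vocabulary_density(WELLS["unique"], WELLS["total"])
verne_density = vocabulary_density(VERNE["unique"], VERNE["total"])

# Difference in unique word forms between the two corpora.
unique_gap = WELLS["unique"] - VERNE["unique"]

print(f"Wells density: {wells_density:.4f}")
print(f"Verne density: {verne_density:.4f}")
print(f"Gap in unique word forms: {unique_gap}")
```

At corpus level both densities come out at roughly two unique forms per hundred words. This ratio falls as a text gets longer, so the per-document densities of short and long novels are not directly comparable.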

5. Frequency Comparison

a. Corpus 
b. Individual words or word clusters

What does the word frequency analysis tell you about the possible comparisons or differences between the documents?

  • Both corpora show references to themes of technology and its uses, but these mentions vary in frequency (approx. 637 for Verne and 1287 for Wells).

  • Both corpora show references to themes of war and violence, again with differences in frequency.

  • In Wells’ corpus, keywords for war and violence appear frequently throughout. The keywords used to measure this (war, violence, etc.) are listed in the table below.

[Table 1: keywords for war and violence in Wells’ corpus]

In Wells’ corpus, keywords for technology and inventions appear frequently throughout. In the chart they track the mentions of war and violence closely, i.e. when mentions of violence are high, mentions of technology are high as well. The keywords used to measure this (machines, inventions, etc.) are listed in the table below.

[Table 2: keywords for technology and inventions in Wells’ corpus]

In Verne’s corpus, keywords for technology and inventions appear somewhat frequently throughout. They tend to feature most frequently towards the end of Verne’s writing career. The keywords used to measure this (machines, inventions, etc.) are listed in the table below.

[Table 3: keywords for technology and inventions in Verne’s corpus]

In Verne’s corpus, keywords for war and violence appear somewhat frequently throughout. The keywords used to measure this (war, violence, etc.) are listed in the table below.

[Table 4: keywords for war and violence in Verne’s corpus]

In Verne’s corpus, keywords for travel and transportation appear frequently throughout. The keywords used to measure this (travel, journey, ship, etc.) are listed in the table below.

[Table 5: keywords for travel and transportation in Verne’s corpus]

These frequencies lead us to believe that Wells and Verne wrote starkly contrasting novels. More specifically, Wells writes about violent uses of technology and machines more than Verne does. Another difference is that Verne very often writes about voyages and journeying. In addition, the relative scarcity of technology mentions in Verne’s corpus leads us to believe that Verne wrote less about “futuristic” inventions than Wells did. So while both are understood to be the fathers of science fiction, their novels tackle significantly different elements. This goes against the portion of our hypothesis that asserts that Verne writes about the use of technology for travelling and voyages. However, the frequency data does seem to corroborate that Verne doesn’t frequently write about violence and technology.
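As a rough illustration of how keyword-group frequencies like those above could be reproduced outside Voyant, the sketch below counts tokens that begin with any stem in a group. The group contents are illustrative assumptions assembled from terms mentioned in this report, not the full lists in our tables, and `tokenize` and `group_counts` are hypothetical helper names.

```python
import re
from collections import Counter

# Illustrative stems only (assumption): the real lists are in the tables.
KEYWORD_GROUPS = {
    "technology": ["machine", "mechan", "invent", "device", "engine"],
    "violence": ["war", "violen", "kill", "bomb"],
    "travel": ["travel", "journey", "ship"],
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def group_counts(text, groups=KEYWORD_GROUPS):
    """Count tokens starting with any stem of each group; also return the token total."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for name, stems in groups.items():
            if any(token.startswith(stem) for stem in stems):
                counts[name] += 1
    return counts, len(tokens)


sample = (
    "The war machine rolled on. The inventor feared the engines of war, "
    "yet the ship sailed."
)
sample_counts, sample_total = group_counts(sample)

for name in KEYWORD_GROUPS:
    # Rate per 10,000 words so corpora of different sizes can be compared.
    rate = 10_000 * sample_counts[name] / sample_total
    print(f"{name}: {sample_counts[name]} hits, {rate:.0f} per 10,000 words")
```

Prefix matching over-counts: the stem "war" also matches "warm" or "warning". Any list built this way needs manual checking, which is one reason the keyword pipe may have missed or mismatched terms.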

What does “drilling down” on one word or word cluster tell you about possible comparisons between documents?

For our project, we “drilled down” on the keyword pipe for technology and inventions. The motivation for this stemmed from the fact that our hypothesis seeks to establish a common theme (technology) across two authors, and then to show that they depict that common theme in diverging ways. To reiterate, we sought to show that H.G. Wells portrayed technology in conjunction with violence, and that Verne described technology in combination with voyages and journeying.

Our “drilling down” gave us a few potential points for comparison across the works of these two authors. As mentioned earlier, Wells seems to mention keywords associated with technology and inventions more frequently in his works than Verne does. The “drill down” feature divides all texts in a corpus into 10 equal sections. Supposing we divide the length (on the x axis) into three equal sections that indicate the beginning, middle, and end of the texts, we see that Wells mentions technology frequently in the first third and the last third of his novels. As the graph shows, this isn’t conclusively the case, given a few clear peaks in the middle section (Anticipations, 1901 and The Time Machine, 1895).

In the case of Verne, we see that although he mentions technology infrequently, when he does mention it, it seems to appear most frequently in the first third and then again in the last third. However, there is a key issue with this, as there is clearly an outlier in the middle section (The Master of the World, 1904). So while we would be keen to establish a strong parallel between the authors and the segments in which they mention technology, it seems that it is at most a mild trend.

[Figure: H.G. Wells “drill down” on the pipe machine|mechan|invent|device|engine*]
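The segmenting behind the “drill down” can be sketched as follows: split a text's tokens into equal-sized segments and count keyword matches in each. This is our own minimal approximation, not Voyant's implementation. `segment_counts` is a hypothetical helper, and the regular expression is a Python translation of the pipe, since in Python `engine*` would mean "engin" followed by any number of "e" characters rather than a wildcard.

```python
import re

# Python translation of the pipe: match tokens that START with any stem.
TECH_PATTERN = re.compile(r"(?:machine|mechan|invent|device|engine)")


def segment_counts(tokens, pattern, n_segments=10):
    """Count pattern matches in each of n_segments equal slices of tokens."""
    if n_segments <= 0:
        raise ValueError("n_segments must be positive")
    counts = [0] * n_segments
    total = len(tokens)
    for i, token in enumerate(tokens):
        # re.match anchors at the start of the token, giving prefix matching.
        if pattern.match(token):
            counts[i * n_segments // total] += 1
    return counts


# Toy "text" of 30 tokens with technology words at known positions.
demo_tokens = ["filler"] * 30
demo_tokens[0] = "machine"
demo_tokens[1] = "inventor"
demo_tokens[14] = "engines"
demo_tokens[29] = "device"

tenths = segment_counts(demo_tokens, TECH_PATTERN, n_segments=10)
thirds = segment_counts(demo_tokens, TECH_PATTERN, n_segments=3)
print("per tenth:", tenths)
print("per third:", thirds)
```

Ten segments do not divide evenly into three, so the thirds here are computed directly from token positions rather than by grouping tenths. Reading thirds off a ten-segment chart, as we did, is therefore only approximate.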

6. Collocation/Correlation

  1. Corpus
  2. Specific words/clusters

What does the collocates’ chart tell you about the corpus?

In general, since the collocate counts are very low for most of the terms, the scope for drawing conclusions from the collocates’ chart is limited. Among all the terms from the three domains (tech, torment and travel), the highest count that relates to the hypothesis is ‘war*-world (39)’.

The rest of the relevant or usable collocates have counts below 39. Wells and Verne used technical terms in different ways: Wells treated technology as a theme of discovery and invention, whereas Verne talked more about machinery and the concepts behind particular technologies. This is evident from the collocates of ‘mechan’, which include words such as apparatus and hydrography in Verne’s corpus, and development and progress in Wells’ corpus. This contradicts the hypothesis, since Wells seems to present technical innovation as betterment in this case. However, the collocates of the other tech terms ‘machine’ and ‘invent’ include gun, war and horror in Wells’ corpus, which indicates that he was hinting at possible negative outcomes of scientific advancement.

In Wells’ corpus, it is hard to support the hypothesis via the collocates of torment terms such as ‘kill*’ and ‘war’: apart from the collocates of ‘bomb’, which include machine, factory and airplane, the torment terms do not collocate significantly with technical words.

H.G. Wells’ novels in the corpus do not involve travelling as much as Verne’s do. Moreover, his texts lean towards time travel: the collocates of the terms ‘travel*’ and ‘disco’, although their counts are significantly lower than in the Jules Verne corpus, have time at the top and also include words like velocity and speed. The Jules Verne corpus, by contrast, covers travelling and journeys extensively, especially travel by ship; the word ‘captain’ appears quite prominently in its collocates and correlations.

[Tables: collocates and correlations for the tech, torment and travel terms (tn1, t2 to t5, t7 to t18)]

What does “drilling down” into the collocates/correlation tell you about the corpus?

When drilling down into the collocates for the term “machin*” in the Jules Verne corpus, we get “air” as the most prominent term. “Air” appears to be a relatively common feature and has a significant relative frequency throughout the Verne corpus. As such, we can infer that machines that have something to do with air, probably flying machines, are prevalent in Verne’s works. These might qualify as science fiction, since the first plane by the Wright Brothers took off in December 1903, while the last work in this corpus (The Master of the World) was published in 1904. Interestingly, the term “aeroplane” only appears in Robur the Conqueror (1886) and The Castaways of the Flag (1900), while the term “airship” appears almost exclusively in The Master of the World (1904).

Furthermore, in H.G. Wells’ corpus, the term “machin*” has “new” as a prominent collocate. “New” appears a fair few times throughout the corpus but, somewhat ironically, most significantly in the middle-to-late period of his published works. “New machines” may refer to inventions, for instance. The term “invention*” has a relatively low frequency in Wells’ corpus, but its trend resembles that of the relative frequencies of the term “new”.
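The collocate counts discussed in this section can be approximated by counting the words that fall within a fixed window around each keyword occurrence. The sketch below is a simplified stand-in for Voyant's Collocates tool, not a reproduction of it. The window size, the stopword list and the sample sentence are all our own assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens, stem, window=5, stopwords=frozenset()):
    """Count words within `window` tokens of every token starting with `stem`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if not token.startswith(stem):
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            neighbour = tokens[j]
            # Skip the keyword occurrence itself and any stopwords.
            if j != i and neighbour not in stopwords:
                counts[neighbour] += 1
    return counts


# Invented sample sentence (not a quotation from either corpus).
sample = (
    "The flying machine rose into the air. "
    "The new machine cut the air with its screws."
)
stop = frozenset({"the", "into", "with", "its"})
machine_collocates = collocates(tokenize(sample), "machin", window=3, stopwords=stop)

for word, count in machine_collocates.most_common(3):
    print(word, count)
```

With counts as low as most of ours, a single passage can put a word at the top of the list, which is why we treat the collocate evidence as suggestive rather than conclusive.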

7. Third Tool

  1. Explanation, Motivation
  2. Result

For the purpose of this project, we used Bubblelines as our third tool of analysis. Our motivation lies in wanting to strengthen the connection between the mentions of technology and violence in Wells’ work, and of technology and travel in Verne’s work. While we have observed high frequencies for technology and violence in Wells, Bubblelines shows their frequency across the length of a text. It therefore shows whether the mentions of technology and violence appear together, which would strengthen or weaken the proposition that they are intertwined themes in the novels. Similarly for Verne, it shows whether the places where travel is mentioned frequently also mention technology.

In the case of Wells, the mentions of violence seem to mostly coincide with the mentions of technology, leading us to infer that the two are mentioned together and in similar areas of the texts. In the case of Verne, the mentions of violence coincide with the mentions of technology somewhat frequently, but there are longer sections where technology isn’t mentioned in proximity to violence. This leads us to infer that in Verne’s writing, violence and technology sometimes appear in similar sections of a novel but are often referred to independently of one another. Lastly, for Verne, we looked at the Bubblelines for technology and travel and found that the mentions of travel and technology do seem to coincide through the length of a novel. However, the mentions of travel heavily outweigh the mentions of technology, which makes the small sample of technology mentions less reliable to generalise from.
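Bubblelines is a visual tool, so our reading of “coinciding” mentions was done by eye. One way to put a number on it, sketched below, is to take per-segment counts for two themes and compute how often they appear in the same segments. This is a substitute measure of our own, not something Bubblelines computes. The function names and the segment counts are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of segment counts."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either series is constant
    return cov / math.sqrt(var_x * var_y)


def co_presence(xs, ys):
    """Share of segments mentioning theme x that also mention theme y."""
    with_x = [(x, y) for x, y in zip(xs, ys) if x > 0]
    if not with_x:
        return float("nan")
    return sum(1 for _, y in with_x if y > 0) / len(with_x)


# Hypothetical counts per segment for one novel (not measured data).
technology = [3, 0, 1, 4]
violence = [2, 0, 1, 3]

tech_violence_r = pearson(technology, violence)
tech_violence_share = co_presence(technology, violence)
print(f"correlation: {tech_violence_r:.2f}")
print(f"co-presence: {tech_violence_share:.2f}")
```

A high correlation only shows that the two themes rise and fall in the same parts of a text. It does not show that the technology is being used violently, which still requires reading the passages.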

img2

img3

8. Conclusion

The turn of the 20th century was a time rife with change and advancement. Automobiles, machinery and even flying transport were features of an era of technological development. Two prolific authors wrote the bulk of their work in this time, works which were both incredibly popular and massively influential, causing the two authors to be often referred to as the “Fathers of Science Fiction”. While H.G. Wells wrote about fictional inventions such as an invisibility serum or a time machine and about how these devices could lead to harm, Verne explored how travel could be made possible to far-off or inaccessible regions such as the Centre of the Earth or under the sea. Both authors even wrote about flying transport and how aeroplanes (which had not yet been invented) could be used for war. As such, this paper has examined how each author portrays and then reacts to such fictional inventions and technology.

It has been observed through the analysis in this paper that Wells’ works feature more references to war and violence, and that there is a relationship between violence and technology, as spikes in mentions of violence coincide with mentions of technology. This is further supported by the overlap of the two themes found with the Bubblelines tool. Travel and journeys are not a distinct feature of Wells’ corpus, and no conclusive relationship involving them can be drawn, as the frequency of mentions of travel is low (lower than in Verne’s corpus). As such, Wells has a strong focus on technology and violent themes, which are present in many, if not most, of the texts in his corpus.

Jules Verne’s works in the corpus do not contain as conclusive a relation between violence and technology or even between technology and travel. However, an overlap between violence and travel has been observed when using the Bubblelines tool. Travel is a theme that’s prevalent throughout the Verne corpus, with technology being prominent in some works, but not all of them. Violence as a theme is also present in a few works, but it is not clear if this is in relation or in reaction to the introduction of some technology or invention in Verne’s corpus.

From this analysis, we conclude that Wells tends to react more negatively to new technology, while Verne has a more mixed reaction.

9. Reflection

The study ran into many limitations. First, the corpus contains 22 texts from Wells’ bibliography and 20 from Verne’s, which does not reflect their complete works. As such, a few works with spikes in mentions of violence, inventions or travel will bias the trends quite significantly. Outliers are the biggest caveat with this study. In addition, Verne wrote in French and many of the books in his corpus are translations, meaning some of his original meaning and intent may have been lost. Secondly, Voyant was quite difficult to work out and not very intuitive, considering all the group members were using it for the first time. Perhaps some tools were not used to best effect here, and some may have been misunderstood entirely. Cleaning the data for uploading to Voyant also took a lot of time, as Voyant’s cleaning option is not very effective. Finally, the keyword pipes we created for the frequency tool might have missed some terms that would better capture the intended meaning.

Regardless, the project was extremely illuminating. Both authors are greatly influential, and it was very interesting to see how they might have paved the way for future science fiction, for instance how some terms that were fictional (aeroplane, for example) later became fact. The evolution of the authors and the change in their writing style was another point of great interest and might be a topic of further study.

As such, further research into this hypothesis might yield additional or even contradictory information. What would happen if other terms were used? What would happen if the corpus were expanded? What would happen if another “Father of Science Fiction”, Hugo Gernsback, were compared with Wells and Verne? Hopefully, we will have the opportunity, the expertise (and the time!) to delve further into this field.

For Appendix check this